Improving on the Naïve Bayes Document Classifier
Abstract
The Naïve Bayes document classifier has been used in many document classification algorithms [1], but because of its many shortcomings it is effective on only a small subset of documents [2]. By extending the basic functionality of the simple Naïve Bayes classifier, the algorithm can be applied to a much wider range of documents. This paper investigates the benefits of adding Feature Selection, the Binary Independence model, and the Multinomial model to the Naïve Bayes classifier.
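To make the three extensions concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the paper's implementation. It assumes a lowercase whitespace tokenizer, uses mutual information as the feature-selection criterion (the abstract does not say which criterion is used), and applies Laplace (add-one) smoothing. The function names `select_features`, `train`, and `predict` and the toy spam/ham corpus are hypothetical. The `"bernoulli"` option follows the binary independence model, scoring every selected term by its presence or absence, while `"multinomial"` scores each term occurrence.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, chosen for illustration only.
    return text.lower().split()


def select_features(docs, labels, k):
    """Keep the k terms with the highest mutual information with the class.

    Assumption: mutual information over binary term presence is used here as
    one possible feature-selection criterion.
    """
    n = len(docs)
    doc_sets = [set(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    classes = sorted(set(labels))
    scores = {}
    for t in vocab:
        mi = 0.0
        for c in classes:
            n_c = labels.count(c)
            for present in (True, False):
                # Documents whose presence of t matches `present`, and those also labelled c.
                n_t = sum(1 for s in doc_sets if (t in s) == present)
                n_tc = sum(1 for s, y in zip(doc_sets, labels)
                           if (t in s) == present and y == c)
                if n_tc:
                    mi += (n_tc / n) * math.log(n * n_tc / (n_t * n_c))
        scores[t] = mi
    # Highest score first; ties broken alphabetically for determinism.
    return set(sorted(vocab, key=lambda t: (-scores[t], t))[:k])


def train(docs, labels, vocab, model):
    """Estimate log priors and per-class term probabilities with add-one smoothing."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    cond = {}
    for c in classes:
        class_docs = [tokenize(d) for d, y in zip(docs, labels) if y == c]
        if model == "multinomial":
            # P(t|c): share of all selected-term occurrences in class c.
            counts = Counter(t for d in class_docs for t in d if t in vocab)
            total = sum(counts.values())
            cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
        elif model == "bernoulli":
            # P(t|c): share of class-c documents containing t (binary independence).
            df = Counter(t for d in class_docs for t in set(d) if t in vocab)
            cond[c] = {t: (df[t] + 1) / (len(class_docs) + 2) for t in vocab}
        else:
            raise ValueError(f"unknown model: {model}")
    return classes, prior, cond


def predict(text, classes, prior, cond, vocab, model):
    """Return the class with the highest log posterior score."""
    tokens = [t for t in tokenize(text) if t in vocab]
    best, best_score = None, -math.inf
    for c in classes:
        score = prior[c]
        if model == "multinomial":
            # Every occurrence of a selected term contributes log P(t|c).
            score += sum(math.log(cond[c][t]) for t in tokens)
        else:
            # Every selected term contributes, whether present or absent.
            present = set(tokens)
            score += sum(math.log(cond[c][t] if t in present else 1 - cond[c][t])
                         for t in vocab)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical toy corpus, for illustration only.
docs = ["cheap pills buy now", "buy cheap watches now",
        "meeting agenda for monday", "project meeting notes monday"]
labels = ["spam", "spam", "ham", "ham"]
vocab = select_features(docs, labels, k=5)
for model in ("multinomial", "bernoulli"):
    params = train(docs, labels, vocab, model)
    print(model, predict("buy cheap pills", *params, vocab, model))
# prints "multinomial spam" then "bernoulli spam"
```

The main design difference between the two models: the binary independence (Bernoulli) model penalizes a class when its characteristic terms are absent from a document, whereas the multinomial model ignores absent terms but rewards repeated occurrences of present ones.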
Related papers
Classification Using Naïve Bayes- a Survey
Classification, particularly text classification, is a supervised learning approach that assigns observations to categories using an available training set of correctly labelled observations, each represented as a set of features. Classification involves several phases. The main classification phase involves the use of classification algorithms, or classifiers. Among the various classifiers, the N...
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Is Naïve Bayes a Good Classifier for Document Classification?
Document classification is a growing area of interest in text-mining research. Correctly assigning documents to particular categories remains challenging because of the very large number of features in the dataset. Among existing classification approaches, Naïve Bayes is potentially a good document classification model because of its simplicity. The aim of this...
Naïve Bayes Classifier with Various Smoothing Techniques for Text Documents
Due to the huge increase in text data, its classification has become an important issue nowadays. This paper discusses many good classification techniques. Each classification method has its own assumptions, advantages and limitations. One of the most widely used classifiers is Naïve Bayes, which performs well on different data sets. Various smoothing techniques are applied ...
Bayesian Model Averaging for Improving Performance of the Naïve Bayes Classifier
Feature selection has proved to be an effective way to reduce model complexity while maintaining relatively good accuracy, especially when data are scarce or some features are expensive to acquire. However, the single selected model may not generalize well to unseen test data, whereas other models may perform better. Bayesian Model Averaging (BMA) is a widely used approach to...